A Generic Sentence Trimmer with CRFs
نویسنده
چکیده
The paper presents a novel sentence trimmer in Japanese, which combines a non-statistical yet generic tree generation model and Conditional Random Fields (CRFs), to address improving the grammaticality of compression while retaining its relevance. Experiments found that the present approach outperforms in grammaticality and in relevance a dependency-centric approach (Oguro et al., 2000; Morooka et al., 2004; Yamagata et al., 2006; Fukutomi et al., 2007)− the only line of work in prior literature (on Japanese compression) we are aware of that allows replication and permits a direct comparison.
منابع مشابه
Citation Handling: Processing Citation Texts in Scientific Documents
Title of thesis: CITATION HANDLING: PROCESSING CITATION TEXTS IN SCIENTIFIC DOCUMENTS Michael Alan Whidby Master of Science, 2012 Thesis directed by: Professor Bonnie Dorr Dr. David Zajic Department of Computer Science Citation sentences (sentences that cite other papers) play a key role in the summarization of scientific articles. However, a citation-based summarization system that depends on ...
متن کاملA Sentence-Trimming Approach to Multi-Document Summarization
We implemented an initial application of a sentence-trimming approach (Trimmer) to the problem of multi-document summarization in the MSE2005 and DUC2005 tasks. Sentence trimming was incorporated into a feature-based summarization system, called MultiDocument Trimmer (MDT), by using sentence trimming as both a preprocessing stage and a feature for sentence ranking. We demonstrate that we were a...
متن کاملSentence Compression as a Component of a Multi-Document Summarization System
We applied a single-document sentencetrimming approach (Trimmer) to the problem of multi-document summarization. Trimmer was designed with the intention of compressing a lead sentence into a space consisting of tens of characters. In our Multi-Document Trimmer (MDT), we use Trimmer to generate multiple trimmed candidates for each sentence. Sentence selection is used to determine which trimmed c...
متن کاملAdding Redundant Features for CRFs-based Sentence Sentiment Classification
In this paper, we present a novel method based on CRFs in response to the two special characteristics of “contextual dependency” and “label redundancy” in sentence sentiment classification. We try to capture the contextual constraints on sentence sentiment using CRFs. Through introducing redundant labels into the original sentimental label set and organizing all labels into a hierarchy, our met...
متن کاملSentence boundary detection using sequential dependency analysis combined with CRF-based chunking
In spoken language, sentence boundaries are much less explicit than in written language. Since conventional natural language processing (NLP) techniques are generally designed assuming the sentence boundaries are already given, it is crucial to detect the boundaries accurately for applying such NLP techniques to spoken language. Classification frameworks, such as Support Vector Machines (SVMs) ...
متن کامل